ABSTRACT

A  general  model  of  equipment  performance  as  a  function  of  maintenance 
is  developed  that  permits  quantification  of  the  optimal  level  of  maintenance 
in  terms  of  performance  attainment  and  relative  factor  costs.    The  model 
formulation  is  that  of  a  finite  state,  finite  action  Markovian  decision 
process.    The  report  supplies  a  listing  for  a  program  in  BASIC  of  the 
policy  improvement  algorithm  for  finding  a  best  policy.    The  model  will 
help  maintenance  engineers,  building  managers  and  others  responsible  for 
making  decisions  concerning  maintenance  policies  in  selecting  economically 
efficient  levels  of  maintenance  for  elements  of  building  service  equipment. 
The  report  also  contains  an  illustrative  example  applying  the  model  to 
the  maintenance  of  an  air  handling  unit. 

Key words: Dynamic programming; Economic analysis; Energy conservation;
Equipment maintenance; Markov decision process; Policy improvement
algorithm.

EXECUTIVE SUMMARY

Energy  use  in  the  United  States  relative  to  the  availability  of 
energy  resources  has  reached  such  proportions  that  it  is  regarded  as  cause 
for  national  concern.    An  implication  of  the  energy  shortage  is  that  the  cost 
of  energy  resources  relative  to  the  cost  of  equipment  maintenance  has  risen. 
Better  maintained  equipment  will  use  less  energy  per  unit  output.  Under 
these  conditions  it  is  commonly  profitable  to  increase  the  level  of 
maintenance  of  equipment  used  to  deliver  building  services  above  the  level 
used  at  the  historical,  lower  cost  of  energy  resources. 

The  primary  purpose  of  this  report  is  to  provide  households  and  firms 
with  a  means  to  reduce  the  operation  and  maintenance  cost  of  their  energy 
using  equipment.    A  further  purpose  is  to  show  how  to  analyze  the  energy 
conservation  effect  of  the  cost  minimizing  policies  derived. 

Economic  evaluation  and  comparison  of  alternative  maintenance  policies 
requires consideration of factors beyond the immediate impact of individual
maintenance  actions  like  machine  cleaning  and  lubrication  or  part  replacement 
for  greater  efficiency.    A  complete  costing  of  any  policy  requires  that  the 
consequences  of  such  maintenance  actions  for  the  long  term  future  performance 
of  the  equipment  be  taken  into  account. 

To  make  an  adequate  comparison  of  the  economic  performance  of 
different  maintenance  policies,  technological  and  cost  data  are  needed. 
It is also necessary to have a method for analyzing and evaluating the
implications of these data for the present values of alternative maintenance
policies.

In many cases future energy consumption by various units of equipment
and the results of maintenance actions on their energy utilization
are not known with certainty. The report presents a method for decision
making to be applied in a stochastic environment in which only the
probability distributions of these values are known.

The  perspective  of  the  firm  or  household  faced  with  maintenance 
decisions  is  considered.    The  report  derives  policies  that  will  minimize 
costs  that  these  units  can  be  expected  to  take  into  account.    It  is  thus 
principally  the  perceived  costs  that  firms  and  households  have  to  pay 
for energy consumed and for equipment maintenance for which the report
supplies  an  analysis. 

The  report  contains  listings  of  computer  programs  that  will  enable 
the  person  responsible  for  formulating  maintenance  policies  to  select  a 
policy  that  will  minimize  the  expected  present  value  of  future  costs. 
This  is  the  incentive  that  is  expected  to  motivate  him  (or  her)  to 
implement  an  optimal  policy. 

There  is  no  attempt  in  this  report  to  assess  possible  inducements 
for  energy  conservation  not  inherent  in  the  price  mechanism.    The  report 
does,  however,  include  a  discussion  of  how  to  estimate  the  energy  conservation 
effect  at  the  micro-economic  level  (the  firm  or  household)  of  an  economically 
responsive  maintenance  policy  by  such  a  unit. 

The method presented for deriving optimal equipment maintenance
policies can be implemented with current computer hardware and software.
An illustrative example is worked out within the report. The analysis
and programs are developed for conditions in which it is assumed that
relative prices will remain unchanged. However, the report also shows
how to modify, for a case in which energy prices are expected to be
increasing, the methods and computer programs supplied.

1. INTRODUCTION

1.1 Background

The  cost  of  energy  resources  relative  to  the  cost  of  equipment 
maintenance  has  risen.    Better  maintained  equipment  will  use  less  energy 
resources  per  unit  output.    Under  these  conditions  it  is  sometimes 
profitable  to  increase  the  level  of  maintenance  of  equipment  used  to 
deliver  building  services  above  the  level  at  the  historical,  lower  cost 
of  energy  resources. 

Economic  evaluation  and  comparison  of  alternative  maintenance 
policies  requires  consideration  of  factors  additional  to  the  immediate 
impact  of  actions  like  machine  cleaning  and  lubrication  or  part  replacement 
for  greater  efficiency.    A  complete  costing  of  any  policy  requires  that 
the  consequences  of  such  maintenance  actions  for  the  long  term  future 
performance  of  the  equipment  be  taken  into  account. 

To  make  an  adequate  comparison  of  the  economic  performance  of 
different  maintenance  policies,  we  need  technological  and  cost  data  as 
well  as  a  method  for  analyzing  and  evaluating  the  implications  of  these 
data  for  the  present  values  of  the  costs  of  alternative  maintenance 
policies.

1.2 Purpose

The  general  purpose  of  this  paper  is  to  provide  firms  and  households 
with  a  means  for  making  more  effective  use  of  energy  and  maintenance 
resources.    The  steps  used  in  the  report  to  achieve  this  purpose  are: 
(1)    to  explain  and  illustrate  with  an  example  a  method  for  modeling  the 
performance  of  building  service  equipment  as  a  function  of  maintenance 
policies,  (2)    to  demonstrate  a  format  in  which  the  technological  and 
cost  data  concerning  equipment  performance  and  maintenance  actions  can 
be  filed  within  a  computer  for  use  in  deriving  and  evaluating  maintenance 
policies, (3) to explain how to evaluate the expected present value of
future  costs,  (4)    to  explain  a  method  for  finding  a  maintenance  policy 
that  minimizes  the  present  value  of  future  costs,  and  (5)    to  present 
the  listing  of  a  computer  program  in  BASIC  to  achieve  the  optimization. 

The  paper  is  intended  primarily  for  planners,  decision  makers,  and 
researchers  in  the  area  of  equipment  maintenance.    Based  on  their 
knowledge  of  the  equipment  to  be  serviced  they  can  formulate  its  description 
and  use  the  analysis  and  programs  presented  in  the  report  to  evaluate 
alternative  policies  and/or  to  find  a  policy  that  will  minimize  the 
present  value  of  expected  future  costs. 

1.3 Scope and Organization

The  assumptions  concerning  the  structural  properties  of  the  system 
being  modeled  are  explained  in  Chapter  2.    Basic  technological  and 
economic  characteristics  of  Markov  chain  processes  and  some  implications 
of  these  characteristics  are  discussed. 

Chapter 3 describes stationary policies and shows how to find the
expected present values of their costs. In Chapter 4 an optimization
procedure for Markovian decision processes is explained. In Chapter 5
the effect of different fuel prices on costs and consequently the relative
merits of different maintenance policies is described in terms of
comparative statics. In that chapter it is also shown how to apply the
policy evaluation routine and optimization method to the case of rising
rather than constant energy prices.

Chapter  6  (by  James  Kao)  is  an  application  of  the  model  of  the 
previous  sections  to  the  maintenance  problem  for  an  air  handling  unit. 
Chapter  7  summarizes  the  paper  briefly.    It  also  suggests  areas  in  the 
economics of equipment maintenance for energy conservation which require
further  research. 

2. FINITE STATE-ACTION MARKOVIAN DECISION MODELS

This  chapter  gives  some  information  on  the  model  that  will  be  used 
to  enable  firms  and  households  to  make  more  efficient  maintenance  decisions. 
First  it  describes  the  nature  of  the  finite  state  stochastic  processes 
under  consideration.    In  particular  a  stationarity  assumption  for  the 
processes  is  formulated.    Then  the  sequential  character  of  the  decision 
making  procedure  that  can  be  used  is  discussed.    Stationary  policies  are 
defined.    The  section  closes  with  the  remark  that  future  costs  are 
discounted  to  get  their  present  value.    Thus  different  sequences  of 
costs  can  be  compared  and  ranked. 

2.1 Markov property

By  a  system  we  mean  any  set  of  pieces  of  equipment  and  the  technology 
governing  the  behavior  of  the  equipment  over  time.    Specific  examples  of 
systems  are  a  heating  system  or  an  air  conditioning  system.    A  state  is 
a  possible  condition  of  the  system  and  a  maintenance  controller  is 
someone  who  decides  which  of  the  alternative  maintenance  actions  should 
be  taken  concerning  the  system.    While  more  general  models  in  roughly 
the  same  framework  are  possible,  for  simplicity  we  restrict  consideration 
to  a  system  that  is  observed  at  equally  spaced  time  intervals.  Upon 
examination  the  system  is  found  to  be  in  one  of  a  given  finite  number  of 
states.    After  observation  the  maintenance  controller  takes  one  of  a 
finite  number  of  specified  actions. 

Let $S_i$, $i = 1, \ldots, D$ denote the possible states of the system and
$a_j$, $j = 1, \ldots, A_i$ denote the various actions that can be taken in
state $S_i$. When the system is in state $S_i$ and action $a_j$ is taken then
two things occur as a result: (1) a cost whose expected value is
$R(S_i, a_j)$ is incurred, and (2) at the time the system is next observed it
will be in each of the states $S_k$ with probability $P_{ik}(a_j)$.

Note that the cost immediately incurred, $R(S_i, a_j)$, depends only on
the current state-action pair. In particular, neither the costs incurred
nor the transition probabilities associated with each state-action
pair depend on the calendar date at which the event occurs, except
as this is incorporated in the state description. Nor do they depend
on the history of the system prior to the present, except as this
history resulted in the system being in the current state.

The first of the above properties is called stationarity. The
second  is  the  Markovian  property.    Both  are  significant  for  developing 
decision  procedures  for  such  systems. 

2.2 State-Action Pairs

This  section  presents  additional  details  on  the  state-action 
description  of  the  equipment  maintenance  model.    The  computer  listing  for 
a  file  of  data  describing  a  hypothetical  piece  of  equipment  is  given. 
Stationary  policies  are  then  defined  and  a  computer  output  illustrating 
policy  specification  is  presented. 

2.2.1 Description of State-Action Pairs

As  indicated  above  in  the  Markov  model  of  equipment  maintenance  the 
following  are  specified: 

(a)  a  set  of  possible  states  of  the  system, 

(b)  for  each  state  a  set  of  possible  maintenance  actions,  including 
possibly  the  instruction  "do  nothing", 

(c)  for  each  allowable  state-action  pair,  the  expected  cost  that  will 
be  incurred  over  the  coming  time  interval, 

(d) for each current state-action pair, the probability that the system
will be in each of the possible states of the system at the
start of the immediately following period.

Table  1  is  a  computer  listing  of  the  data  specified  for  a  hypothetical 
piece  of  equipment  with  nine  possible  states.    For  reference  the  hypothetical 
equipment  will  be  called  MAC9A. 

Table 1. Listing for MAC9A

The interpretation of the listing in Table 1 is as follows:
The  number  of  possible  states  of  the  equipment  is  given  by  the 
first  integer  in  line  100  while  the  second  gives  the  number  of  state- 
action  combinations  possible. 

(a)  the  next  to  last  entry  in  each  row  after  line  100  specifies  by  an 
integer  the  state  of  the  system  in  accordance  with  a  nomenclature 
conventionally  specified  for  the  equipment  and  the  program, 

(b)  the  last  entry  in  each  such  row  specifies  by  an  integer  a  possible 
maintenance  action  when  the  system  is  in  the  state  indicated  according 
to  (a):    a  given  integer  may  have  different  interpretations  when 
indicating  actions  in  different  states, 

(c)  the  entry  third  from  the  last  in  each  of  these  rows  gives  the 
expected  cost  in  dollars  over  the  next  time  interval  for  the  state- 
action  pair  identified  in  accordance  with  (a)  and  (b),  and 

(d) the first nine entries in each row specify the probability that the
system at the immediately following period will be in each of its
nine possible states (entry one: probability that the system will
be in the state with label one, entry two: probability that the
system will be in the state with label two...) given the current
state-action pair identified by the last two entries in the row.

If, as in the notation of section 2.1, we denote the number of actions
possible in state $S_i$ by $A_i$ and the number of possible states by $D$, then
the number of state-action combinations, i.e. the second number in line
100 of the above file, is $\sum_{i=1}^{D} A_i$.
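
The row layout described above can be sketched in code. The following is a minimal illustration in Python rather than the report's BASIC, and it does not reproduce the MAC9A data: the two-state rows, the names `ROWS`, `parse_row`, and `actions_per_state`, and all numerical values are hypothetical. Each row holds D transition probabilities, then the expected cost, the state label, and the action label.

```python
# Hypothetical two-state data in the row layout described in the text:
# D transition probabilities, expected cost, state label, action label.
D = 2
ROWS = [
    [0.9, 0.1, 10.0, 1, 1],  # state 1, action 1
    [1.0, 0.0, 25.0, 1, 2],  # state 1, action 2
    [0.0, 1.0, 40.0, 2, 1],  # state 2, action 1
    [0.8, 0.2, 60.0, 2, 2],  # state 2, action 2
]


def parse_row(row, d):
    """Split one row into (state, action, expected cost, probabilities)."""
    probs = list(row[:d])
    cost = float(row[d])          # third entry from the end
    state = int(row[d + 1])       # next-to-last entry
    action = int(row[d + 2])      # last entry
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("transition probabilities must sum to one")
    return state, action, cost, probs


def actions_per_state(rows, d):
    """Count the actions A_i listed for each state label."""
    counts = {}
    for row in rows:
        state, _, _, _ = parse_row(row, d)
        counts[state] = counts.get(state, 0) + 1
    return counts


counts = actions_per_state(ROWS, D)
total_pairs = sum(counts.values())  # the sum of A_i over all states
```

For this hypothetical file the header line would carry the two integers D = 2 and `total_pairs` = 4.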

2.2.2   Stationary  Policies 

A  policy  for  a  Markov  decision  process  is  a  rule  specifying  for  each 
state  of  the  system  the  action  that  will  be  taken  in  that  state.    In  general, 
the  rule  for  selecting  the  action  may  depend  on  a  number  of  factors  such 
as  the  past  history  of  the  system,  or  calendar  time.    Sometimes  the  rule 
is  in  the  form  of  specifying  a  random  selection  from  several  of  the 
actions    available.    By  a  stationary  policy  is  meant  a  rule  for  selecting 
the  action  to  be  taken  in  each  of  the  states  that  does  not  depend  on  any 
other  factor  than  the  state  of  the  system.    A  non-random  stationary 
policy  is  specified  when  for  each  possible  state  of  the  system  one  of 
the  possible  actions  associated  with  that  state  is  selected  as  the 
action  to  be  taken. 

Consideration in this report will be restricted to non-random
stationary policies. It has been shown elsewhere (Blackwell, 1962) that
for the Markov programming model with a finite number of states and
actions, when costs and transition probabilities do not vary with time,
the class of non-random stationary policies contains one that is optimal
over the set of all possible policies. If we find the best stationary
policy we will thus have an optimal policy. Policy evaluation is effected
by summing the discounted expected value of future costs.

In  Table  2  we  present  a  sample  of  the  printout  of  two  policies  and  the 
results  of  the  policy  evaluations.    The  information  will  be  the  vector  of 
a  policy  and  the  values  of  the  expected  future  costs  of  the  policy 
discounted  to  the  present.  The  index  set  of  the  vectors  is  the  set  of 
different  states  of  the  system.    The  evaluation  is  given  for  each  state 
considered  as  an  initial  state. 

The printout takes the form V(i, j) = C, where i takes on the
integer values from 1 thru D, indexing the possible states of the system.
The pair (i, j) indicates that in state i action j for that state will be
taken. (Recall that a given integer may have different interpretations
when indicating actions in different states.) C at V(i, j) is the
expected value of discounted future cost when the system starts in state
i and the policy (k, j(k)), k = 1, ..., D is used. The sample computer
outputs given below refer to the nine-state piece of equipment of which
the computer description is given in subsection 2.2.1. The outputs
yield evaluations of two different policies applied to the maintenance
of this equipment with a discount factor of .97. This is equivalent to a
rate of interest per period of about 3.1%. But note that we have not
specified the length of time that constitutes one period.


Policy A:

[Printout of V(i, j) = C for the nine states; values illegible in the source.]


See,  for  example,  David  Blackwell,  "Discrete  Dynamic  Programming", 
Annals  of  Mathematical  Statistics,  Vol.  33,  (1962),  pp.  719-726. 

Policy B:

[Printout of V(i, j) = C for the nine states; values illegible in the source.]


Table  2.    Policy  Values 


It  can  be  seen  that  for  each  initial  state  discounted  expected 
costs  are  smaller  for  Policy  B  than  for  Policy  A. 

Expressed in the same notation as in the previous paragraph, the
total number of nonrandom stationary policies is $\prod_{i=1}^{D} A_i$. A
method for selecting a policy to minimize expected present value of future
costs will be described in chapter 4 "Finding an Optimal Policy."

2.3 Discounting future costs

If the rate of interest is i per unit time then future expenditures
and receipts are converted to their present value by being multiplied by
$\left(\frac{1}{1+i}\right)^m$, where m is the number of units of time in
the future at which the expenditures or receipts occur. The number
$\frac{1}{1+i}$ is called the discount factor. It will be denoted A1. When
discounted any bounded sequence of costs sums to a finite present value.
The sum obtained after discounting offers a basis for comparing different
time patterns of costs, and it is future costs summed in this manner that
will be used to compare the performance of alternative maintenance policies.
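
The discounting rule can be checked numerically. The sketch below is in Python rather than the report's BASIC; the function names and the sample cost stream are illustrative assumptions. It converts between the interest rate and the discount factor and sums a finite stream of costs in which the cost at index m occurs m periods in the future.

```python
def discount_factor(interest_rate):
    """Discount factor A1 = 1 / (1 + i) for a per-period interest rate i."""
    return 1.0 / (1.0 + interest_rate)


def implied_interest_rate(factor):
    """Per-period interest rate i implied by a discount factor A1."""
    return 1.0 / factor - 1.0


def present_value(costs, factor):
    """Sum of costs[m] * A1**m, where costs[m] falls m periods ahead."""
    return sum(cost * factor ** m for m, cost in enumerate(costs))


rate_for_097 = implied_interest_rate(0.97)           # about 0.0309
pv_example = present_value([100.0, 100.0, 100.0], 0.5)  # 100 + 50 + 25
```

With a discount factor of .97 the implied rate is roughly 3.1 percent per period, which agrees with the figure quoted in subsection 2.2.2.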
3. FINDING COSTS OF STATIONARY POLICY

Using  the  formulation  of  the  equipment  maintenance  problem  as  a 
Markov  decision  process,  this  section  shows  how  to  evaluate  a  given 
stationary  policy.  The  process  structure  and  available  data  are  those 
described  in  the  previous  chapter.    Expected  present  value  of  future 
costs  is  the  evaluation  criterion.  For  expository  clarity  we  include  a 
statement  of  the  one  state  case  as  a  basis  for  comparison.    The  chapter 
includes  a  program  listing  for  computing  the  sum  of  expected  discounted 
costs  when  the  process  generating  the  costs  is  a  Markov  chain. 

3.1  Present  Value  of  cost  with  one  state 

To  elucidate  the  evaluation  formula  for  a  system  in  which  there  are 
several  states  we  first  express  the  formula  for  the  present  value  of 
costs  in  the  case  in  which  there  is  only  one  possible  state  for  the 
system. 

The  one  state  situation  might  be  the  case  of  a  unit  that  is  disposable 
and  lasts  only  one  period.    The  choice  is  whether  to  get  for  its  one 
period  use  a  more  efficient,  more  expensive  item  or  one  that  is  cheaper 
but  less  energy  efficient. 

If we denote the discount factor by A1 (not to be confused with the
use of the symbol for an action available in state S1), and if R denotes
the cost that we expect to incur in each period, then the sum of expected
discounted future costs is:

$$\sum_{n=0}^{\infty} (A1)^n R = (1 - A1)^{-1} R.$$
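
As a quick numerical check of the one-state formula, the following Python sketch (not the report's BASIC; names and numbers are illustrative assumptions) compares the closed form with a long partial sum of the geometric series.

```python
def closed_form(r, a1):
    """Present value R / (1 - A1) of a constant per-period cost R."""
    return r / (1.0 - a1)


def partial_sum(r, a1, periods):
    """Sum of A1**n * R for n = 0, ..., periods - 1."""
    return sum(a1 ** n * r for n in range(periods))


exact = closed_form(30.0, 0.97)          # 30 / 0.03, i.e. about 1000
approx = partial_sum(30.0, 0.97, 2000)   # the tail beyond 2000 periods is negligible
```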

3.2  Present  value  of  costs  with  several  states 

The formula above for present value of discounted costs as the sum
of a convergent geometric progression generalizes to the case of a
system governed by a stationary probability transition matrix. In the
first two paragraphs we recall and focus part of the development of the
previous section. The final paragraph gives the matrix formula analogous
to that for a scalar given above and a program listing to effect the
computation.

3.2.1  Single  period  costs 

We  are  now  dealing  with  a  Markov  decision  process  for  which  a 
policy  has  been  specified.    The  development  of  the  system  is  thus  described 
by  a  Markov  chain  and,  given  the  state  of  the  system  at  any  time,  the 
expected  costs  that  will  be  incurred  at  that  time  are  also  specified. 
The  expected  cost  for  a  single  period  as  a  function  of  the  state  of  the 
system  can  be  conveniently  represented  as  a  (column)  vector.  Denote 
this  vector  by  R. 

3.2.2  Transition  probability  matrix 

Once a policy has been specified we can read off from the data for
the system described in section 2.2.1 the transition matrix characterizing
the Markov chain under the policy. Denote this matrix by M. The conditional
probability that the system will be in state j in time period n given
that at time 0 the system is in state i is the (i,j)-entry in the matrix
$M^n$. The expected cost incurred at time n is by definition the sum of
the products of the probability of being in state j at time n times the
expected costs incurred in state j, where the sum is taken over all
states. This cost, for each assumed initial state of the system, can be
conveniently expressed in matrix notation:

$$M^n R.$$

3.2.3  Formula  for  present  value  of  costs 

If we again use A1 to denote the discount factor it follows from
the preceding paragraph that the expected present value of costs incurred
over the future is for each given initial state of the system:

$$V = \sum_{n=0}^{\infty} (A1)^n M^n R = \left( \sum_{n=0}^{\infty} (A1)^n M^n \right) R.$$

$(A1)^n M^n$ tends to 0 (the zero matrix) as n increases. So just as
with a geometric progression of numbers,
$\sum_{n=0}^{\infty} (A1)^n M^n = (I - A1 \cdot M)^{-1}$,
and $V = (I - A1 \cdot M)^{-1} R$.

In  Table  3  we  give  the  listing  for  a  subroutine  in  BASIC  to  implement 
the  calculation  described  above.  All  matrices  and  vectors  are  dimensioned 
prior  to  calling  the  subroutine. 

Parameter  values  of  the  following  are  also  specified  in  the  calling 
program: 

(1)  M  is  the  Markov  transition  matrix  associated  with  the  specified 
policy, 

(2)  R  is  a  column  vector  the  entries  of  which  are  the  expected 
costs  over  the  next  time  interval  associated  with  the  specified  policy, 

(3)  I  is  the  identity  matrix,  and 

(4)  Al  is  a  scalar,  the  discount  factor  being  used  in  determining 
the  present  value  of  future  costs. 

The entries in the matrices T, U, and S and the entries in the vector V
are determined in the subroutine. T and U are intermediate storage
locations. S is in effect $\sum_{n=0}^{\infty} (A1)^n M^n$, so that V is the
expected present value of future costs under the given policy discounted
by the factor A1.

Table 3. Listing for Policy Evaluation Subroutine

50  MAT T=(A1)*M
60  MAT U=I-T
70  MAT S=INV(U)
80  MAT V=S*R
99  RETURN
100 END
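
For readers without access to a BASIC system with MAT statements, the same calculation can be sketched in Python. This is an illustration only, not the report's program: the function names and the two-state data are hypothetical, and the sketch solves the linear system (I - A1*M) V = R by Gaussian elimination instead of forming the inverse explicitly as the listing does.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            f = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= f * aug[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def evaluate_policy(m, r, a1):
    """Return V with (I - A1*M) V = R, the discounted cost from each state."""
    n = len(r)
    u = [[(1.0 if i == j else 0.0) - a1 * m[i][j] for j in range(n)]
         for i in range(n)]
    return solve(u, r)


# Hypothetical two-state chain under some fixed policy.
M = [[0.9, 0.1],
     [0.0, 1.0]]
R = [10.0, 40.0]
V = evaluate_policy(M, R, 0.5)
```

Solving the system directly avoids computing a full matrix inverse; for the small state spaces considered in the report either approach should give the same values up to rounding.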
4. FINDING AN OPTIMAL POLICY

This chapter describes the policy improvement iteration method for
optimizing Markov decision processes. For expository clarity the case
of one state is first explained in detail. The case of several states
is then treated. A program listing for effecting the optimization is
included.

4.1  One  State  and  Several  Actions 

As  a  basis  for  describing  policy  improvement  iteration  for  Markov 
decision  processes,  this  section  describes  the  way  the  optimization 
procedure  would  work  for  a  one  state  situation.    This  procedure,  which 
is  a  little  stilted  for  the  one  state  case,  is  then  generalized  to  apply 
to  any  finite  state-action  case. 

Reference is made to the discussion in section 3.1 above of policy
evaluation. Since cost in a single period is a function of the action
taken, we denote this cost as $R(a_i)$, with $a_i$ to indicate the action
taken. The value of this policy, i.e. the sum of expected future discounted
costs, is:

$$V(a_i) = (1 - A1)^{-1} R(a_i).$$

In considering an alternative policy, $a_j$, compare the value of the
old policy with $R(a_j)$, the one period cost of action $a_j$, plus the value
of the old policy discounted one period. In other words, use the following
test quantity to make a decision between the old and the new policy:

$$T \equiv R(a_j) + (A1) \cdot V(a_i) - V(a_i).$$

If $T < 0$, then policy $a_j$ has a smaller expected cost than policy $a_i$. If
$T \geq 0$, then policy $a_j$ does not have smaller expected cost than policy $a_i$.

4.2  System  with  Several  States  and  Several  Actions 

Since we are dealing with a finite state-action system, one could
in principle evaluate all stationary nonrandom policies and select a
best one. However, the number of such policies is $\prod_{i=1}^{D} A_i$.
Thus, for example, if there are 10 states and 2 actions in each state the
number of those policies is more than a thousand. We shall show how the
optimization procedure described in section 4.1 generalizes to a system
with several states.
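
The count of nonrandom stationary policies is easy to verify. The short Python sketch below (not part of the report; the function name is an assumption) multiplies the numbers of actions available in each state.

```python
def count_policies(actions_per_state):
    """Product of A_i over all states: the number of nonrandom stationary policies."""
    total = 1
    for count in actions_per_state:
        total *= count
    return total


ten_by_two = count_policies([2] * 10)   # 10 states, 2 actions in each
```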

Reference is made to chapters 2 and 3 for a discussion of stationary
policies and their evaluation. Assume that a policy has been specified
and evaluated. We now wish to see whether a better policy is available.
A vector with components $V(S_i)$ gives the value of the specified policy.
In considering whether or not there exist better alternative policies
it is only necessary to examine the effect of changing the action in one
state at a time. It was noted by Ronald Howard (1960) that if for each
state no improvement in the value of the system can be obtained by changing
the action in only that state then the policy being considered is optimal.
Otherwise the new policy obtained by introducing the improving action in
the one state and keeping the actions in the other states the same will
be a better policy. Furthermore, it is not necessary to evaluate
the new policy to see whether the change in action in the one state will
yield an improvement or not. Use of a test quantity that we will describe
is sufficient.

The cost in a single period is a function of the state of the
system and the action taken. Denote this cost as $R(S_i, a_l)$. If a
policy has been specified and we wish to examine whether introducing
action $a_k$ in place of action $a_l$ for state $S_i$ would lead to a policy
with greater value, i.e. smaller cost, it is sufficient to examine the
following test quantity:


See  Howard,  Ronald  A.,  Dynamic  Programming  and  Markov  Processes, 
(Technology  Press,  1960). 

$$E1 \equiv R(S_i, a_k) + (A1) \cdot \sum_{j=1}^{D} P_{ij}(a_k) \cdot V(S_j) - V(S_i).$$

The terms A1, $R(S_i, a_k)$, $V(S_i)$ have already been defined. The
symbol $P_{ij}(a_k)$ denotes the probability that if the system is in state
$S_i$ and action $a_k$ is taken then the system will be in state $S_j$ at the
start of the next period. If $E1 < 0$ then the policy with action $a_k$ rather
than $a_l$ taken in state $S_i$ has smaller expected cost than the original one.
If $E1 \geq 0$ then the new policy does not have smaller expected cost than the
original one.

Using  data  described  and  formatted  in  subsection  2.2.1,  the  subroutine 
in  Table  4  can  be  used  to  effect  optimization  based  on  the  above  considerations. 

40  DIM Q(1,20),E(1,1)
50  C=0
60  FOR S1=1 TO D
110 MAT Q=ZER(1,D)
120 FOR K=1 TO L STEP 1
130 IF F(K,D+2)<>S1 THEN 270
140 FOR I=1 TO D STEP 1
150 Q(1,I)=F(K,I)
160 NEXT I
170 MAT E=Q*V
180 E1=F(K,D+1)+A1*E(1,1)-V(S1,1)
190 IF E1>-.00001 THEN 270
200 C=1
210 FOR I=1 TO D STEP 1
220 M(S1,I)=F(K,I)
230 NEXT I
240 R(S1,1)=F(K,D+1)
250 X(S1,1)=F(K,D+3)
260 CALL PEVAL1
270 NEXT K
280 NEXT S1
290 IF C=1 THEN 50
400 RETURN
450 END


Table  4.    Listing  for  Policy  Improvement  Subroutine 
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The  subroutine  operates  as  follows: 

(1)  Starting  with  instruction  60  for  each  state  in  succession,  it 
examines  each  of  the  actions  available  for  that  state  to  see  whether  the 
action  yields  a  smaller  value  of  the  objective  function  than  the  action 
specified in the given policy.

(2)  If  an  action  does  not  yield  a  "sufficient"  (see  instruction  190  for 
the  definition  of  sufficient)  improvement  of  the  cost  function,  the  next 
possible  action  in  the  file  for  the  given  state  is  examined. 

(3)  If  for  a  state  an  action  yielding  an  improvement  is  found,  this 
action  is  introduced  into  the  policy  (instructions  210  thru  250);  the 
value  of  the  new  policy  is  computed  (instruction  260  calls  the  subroutine 
listed  in  subsection  3.2.3  to  do  this);  and  search  for  still  better  actions 
continues  (instruction  270). 

(4)  When  all  actions  in  a  given  state  have  been  examined,  the  same 
procedure  is  followed  for  the  next  state  (instruction  280). 

(5)  If  during  the  steps  (1)  thru  (4)  a  policy  revision  has  taken  place 
(this  would  be  recorded  in  instruction  200),  the  search  for  improvement 
is  repeated  (instruction  290). 

(6)  If  a  search  thru  all  actions  of  all  states  yields  no  "sufficient" 
improvement  of  the  cost  function,  the  program  leaves  the  subroutine  and 
returns  to  the  main  program. 
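
The six steps above can be sketched in a modern language as follows. This is an illustrative Python translation under stated assumptions, not the report's program: the names `solve`, `evaluate` and `improve`, the tuple layout of `ROWS`, and the two-state example data are all hypothetical. Each row carries the transition probabilities, the one-period cost, the state and an action label, in the spirit of the file format of subsection 2.2.1, and the starting policy is taken to be the first row listed for each state.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def evaluate(policy, a1):
    """Value of a fixed policy, V = (I - A1*M)^-1 R (the job of PEVAL1)."""
    d = len(policy)
    lhs = [[(1.0 if i == j else 0.0) - a1 * policy[i][0][j] for j in range(d)]
           for i in range(d)]
    return solve(lhs, [policy[i][1] for i in range(d)])


def improve(rows, d, a1, tol=1e-5):
    """rows: (probs, cost, state, action) tuples; states are 0..d-1."""
    policy = {}
    for probs, cost, state, action in rows:
        # Starting policy (assumption): the first row listed for each state.
        policy.setdefault(state, (probs, cost, action))
    v = evaluate(policy, a1)
    changed = True
    while changed:                                   # instruction 290
        changed = False
        for s in range(d):                           # instructions 60, 280
            for probs, cost, state, action in rows:  # instructions 120, 270
                if state != s:
                    continue
                e1 = cost + a1 * sum(p * x for p, x in zip(probs, v)) - v[s]
                if e1 > -tol:                        # instruction 190
                    continue
                policy[s] = (probs, cost, action)    # instructions 210-250
                v = evaluate(policy, a1)             # instruction 260
                changed = True                       # instruction 200
    return [policy[s][2] for s in range(d)], v


# Hypothetical two-state machine: state 0 = good, state 1 = worn.
ROWS = [
    ([0.8, 0.2], 1.0, 0, "none"),
    ([0.0, 1.0], 5.0, 1, "none"),
    ([1.0, 0.0], 4.0, 1, "repair"),
]
actions, values = improve(ROWS, d=2, a1=0.9)
```

In this small example the routine starts from "do nothing" in both states and switches the worn state to "repair" on the first pass; a second pass finds no further sufficient improvement and the search stops.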

Appendix  I  contains  a  listing  of  the  subroutines  discussed  in  the 
report  and  of  the  program  used  to  call  the  subroutines  and  to  print  the 
optimal  policy  yielded  by  the  computations. 

It is conjectured that the policy finally selected by this routine
will in most cases of practical interest be an optimal one. However,
since in step (2) of the algorithm (corresponding to instruction 190 of
the subroutine) a new policy is introduced only if the improvement of
the cost function is greater than .00001, there exists the possibility
that a potential improvement not larger than this would go undetected.

Nevertheless, we can place an upper bound on the amount by which the
cost of the policy finally decided on by the routine exceeds the minimum
possible cost. Set D(V) as the quantity used to test whether a
sufficiently large improvement is possible or not. (The subroutine above
uses D(V) = -.00001.) Let V* be the value of a cost minimizing policy,
and U a vector with components all 1. We can assert that the value V of a
policy obtained by means of the subroutine listed above will satisfy the
inequality

$$V^* \ge V + \frac{D(V)}{1 - A_1}\, U.$$

In other words, in no state does the cost of the selected policy exceed
the minimum possible cost by more than $|D(V)| / (1 - A_1)$.

See Theorem 1, p. 167 of E. V. Denardo, "Contraction Mappings in
the Theory Underlying Dynamic Programming", SIAM Review, Vol. 9 (1967).
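
A bound of this kind can be motivated by a short monotonicity argument. The sketch below is not taken from the report. It assumes that the routine stops only when every test quantity computed from the final V exceeds D(V), and it writes T for the one-period cost minimization operator, a symbol introduced here for illustration.

```latex
\begin{align*}
(TV)(S_i) &= \min_{k}\Big[ R(S_i, a_k) + A_1 \sum_j P_{ij}(a_k)\, V(S_j) \Big]
            \;\ge\; V(S_i) + D(V) \quad \text{for every state } S_i, \\
T^n V &\ge V + D(V)\,\bigl(1 + A_1 + \dots + A_1^{\,n-1}\bigr)\, U, \\
V^* = \lim_{n \to \infty} T^n V &\ge V + \frac{D(V)}{1 - A_1}\, U .
\end{align*}
```

The second line follows from the first because T is monotone and T(V + cU) = TV + A_1 cU for any constant c. With A_1 = .99 and D(V) = -.00001, the cost of the selected policy would exceed the minimum by at most .001 in any state.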

5. MARKOVIAN MODELS OF EQUIPMENT MAINTENANCE

Chapter  2  described  finite  state-action  Markovian  decision  models. 
Chapter  3  showed  how  to  evaluate  a  policy  in  such  a  system,  and  chapter 
4  showed  how  to  find  an  optimal  policy.    Chapter  5  will  apply  the  model 
to  the  problem  of  equipment  maintenance.    The  trade-off  between  energy 
utilization  and  the  use  of  other  resources  is  brought  into  focus.  The 
response  to  price  signals  is  indicated  through  a  study  of  comparative 
statics. The chapter also shows how to apply the policy evaluation
routine and optimization method to a situation of rising energy prices.

5.1 Operation and Maintenance Cost per Period

Chapter  2,  "Finite  State-Action  Markovian  Decision  Models",  describes 
the  basic  data  needed  for  a  Markov  programming  model  of  an  equipment 
maintenance  problem.    The  chapter  also  presents  a  computer  file  format 
for  keeping  the  data  available  for  computer  analysis  in  the  mathematical 
programming  operations  needed  to  develop  acceptable  policies.    A  piece 
of  equipment  is  described  in  the  format  mentioned  by  a  table  of  numbers. 
The  table  consists  of  L  rows,  where  L  is  the  number  of  state-action 
pairs  possible  for  the  piece  of  equipment,  and  D+3  columns,  where  D  is 
the  number  of  possible  states  of  the  equipment.    In  each  row  corresponding 
to  a  state-action  pair  the  entry  in  column  D+1  is  the  expected  cost  over 
the  current  time  period  when  the  system  is  in  the  given  state  and  the 
action  specified  in  the  given  pair  is  taken.    For  the  equipment  maintenance 
model  the  costs  incurred  are  of  two  kinds: 

(1)  cost  of  labor  and  parts  for  maintenance,  and 

(2)  expected  cost  of  the  energy  consumed. 

Denote the cost in the current period for the state-action pair
(i,k) by Ri(k). Then Ri(k) = Mi(k) + P · Qi(k), where Qi(k) is the
amount of energy (in physical units) that the equipment is expected to
use during the period in state i if maintenance action k for that state
is used, P is the price of energy, and Mi(k) is the amount of other
costs, expressed in dollars, if maintenance action k is applied in state i.
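
As a minimal illustration of this cost decomposition, the Python sketch below recomputes the one-period cost of a single state-action pair at several energy prices. The function name `period_cost` and the sample figures (335 dollars of other costs, 1000 physical units of energy) are hypothetical and serve only to show how the same physical data yield a different cost entry for each price.

```python
def period_cost(maintenance_cost, energy_use, energy_price):
    """Ri(k) = Mi(k) + P * Qi(k) for one state-action pair."""
    return maintenance_cost + energy_price * energy_use


# Hypothetical pair: $335 of other costs, 1000 physical units of energy.
costs_by_price = {p: period_cost(335.0, 1000.0, p) for p in (0.025, 0.05, 0.10)}
```

Only the energy term changes with P, so rerunning the policy selection at a new price requires recomputing the cost column of the data table and nothing else.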

The  program  described  in  chapter  4,  "Finding  an  Optimal  Policy", 
and  listed  in  Appendix  I,  generates  in  succession  policies  of  lower  cost 
until it no longer detects any possible cost reduction. In effect,
within a certain range, the more valuable the price indicator shows
energy to be, the more energy will be conserved by policies to reduce
cost.

5.2   Substitution  of  maintenance  for  energy  resources. 

In attempting to minimize cost in situations in which energy prices
are higher, it is reasonable to assume that policies employing additional
maintenance in place of more energy will be used. However, when a
small change in energy prices leads to a change in maintenance policy,
the cost reduction at the new price resulting from the policy change
will be small (not exceeding the order of magnitude of the price change).
The present value of a policy is a vector with a component defined for
each of the states of the system as the starting state. The graphs of
Figure 1 can be taken as those for one given starting state.


Arithmetic examples can be devised that will have each cost minimizing
policy for a higher energy price entail larger average per period energy
consumption. Thus it is not a mathematical consequence of the assumptions
described in the previous chapters that, when the price of energy rises,
energy utilization defined as expected energy use per period will not
rise. These examples would illustrate why Paul Samuelson develops an
argument to prove that such phenomena do not occur in the production
function he is discussing. ("Although my intuition is poor enough in
three dimensional space, I can assert with confidence on the basis of
the above that raising any input's price while holding all remaining
inputs' prices constant will definitely reduce the amount demanded of
that input by the firm, i.e., $\partial V_i / \partial P_i < 0$," Paul
Samuelson, "Maximum Principles in Analytical Economics", The American
Economic Review, (1972), Vol. 62, No. 3, p. 253). Economic judgement
suggests making the additional assumption formulated in the text.


Figure 1. Policy costs as a function of energy prices (horizontal axis:
energy prices, with $P_A$, $P_0$ and $P_B$ marked)

Figure 1 illustrates the cost effects of a rise in energy prices
leading to a policy change. At price $P_A$, Policy A has lower cost. At
price $P_B$, Policy B has lower cost. At price $P_0$ the costs of the two
policies are the same. When $P_A$ and $P_B$ are both close to $P_0$ the
difference between the cost of Policy A and that of Policy B will not be
large.

Suppose  expected  energy  utilization  under  Policy  B  is  smaller  than 
expected  energy  utilization  under  Policy  A.    A  change  from  Policy  A  to 
Policy B resulting from the desire to use the lower cost policy may save
only a small amount in cost, but it will also conserve energy, and the
energy conservation effect of the policy change needs separate evaluation.


To  compare  expected  energy  utilization  under  the  policies  appropriate 
respectively for energy prices $P_A$ and $P_B$, we run the policy selection
program  for  both  prices.    For  each  policy  (which  is  stationary)  the 
resulting  stationary  Markov  chain  yields  a  stationary  probability  measure 
on  the  states  of  the  system.  Multiplying  the  expected  energy  utilization 
under  the  policy  for  each  state  by  the  probability  of  that  state  and 
summing  yields  the  expected  energy  utilization  per  period  for  the  given 
policy. 

The  subroutine  STATV,  the  listing  of  which  is  given  in  Table  5,  was 
written  to  help  evaluate  the  expected  energy  utilization  resulting  from 
operating  the  equipment  under  a  given  maintenance  policy.  It  calculates 
the  probability  measure  on  the  states  of  the  system  associated  with  a 
given  policy  as  we  demonstrate  in  the  next  paragraph. 

50   MAT  U=I-M 
60   FOR  K=1   TO  D 
70   U(K, 1)=1 
80   NEXT  K 
90   MAT S=INV(U)
200  RETURN 
250  END 

Table  5.    Listing  for  subroutine  STATV 

In the terminology of Kemeny and Snell, "An ergodic chain is characterized
by the fact that it consists of a single ergodic class, that is, it is
possible to go from every state to every other state." Let M be the
transition matrix of an ergodic chain, modified by the possible addition
of some transient states (states to which one cannot go from every other
state). If the total number of states in the chain is D, then the matrix
U (≡ I - M) has rank D-1. The column vector 1 is linearly independent of
the columns of U, since the fixed vector of M is orthogonal to each
column of U. Substituting 1 for the first column of U and denoting the
new matrix U* gives a matrix of rank D, for only the zero vector is
orthogonal to all the columns of the resulting matrix. The first row of
the inverse of U* is the fixed probability vector of M.

John G. Kemeny and J. Laurie Snell, Finite Markov Chains (Van Nostrand,
1960), p. 99.
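
The STATV computation and the expected energy use per period can be sketched as follows. This is an illustrative Python version under stated assumptions, not the report's code: the names `solve`, `stationary_vector` and `expected_energy` and the two-state example are hypothetical. Instead of forming the whole inverse of U*, the sketch solves the transposed system whose solution is the first row of that inverse.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def stationary_vector(m):
    """Fixed probability vector of transition matrix m, following STATV."""
    d = len(m)
    # U* = I - M with its first column replaced by ones.
    u_star = []
    for i in range(d):
        row = [(1.0 if i == j else 0.0) - m[i][j] for j in range(d)]
        row[0] = 1.0
        u_star.append(row)
    # The first row x of inv(U*) satisfies x U* = (1, 0, ..., 0), which is
    # the same as solving the transposed system below.
    transposed = [[u_star[i][j] for i in range(d)] for j in range(d)]
    return solve(transposed, [1.0] + [0.0] * (d - 1))


def expected_energy(m, energy_by_state):
    """Expected energy use per period under the policy with matrix m."""
    pi = stationary_vector(m)
    return sum(p * q for p, q in zip(pi, energy_by_state))


# Hypothetical two-state policy: the worn state (1) is always repaired.
M = [[0.8, 0.2], [1.0, 0.0]]
pi = stationary_vector(M)
energy = expected_energy(M, [10.0, 16.0])
```

For this example the fixed vector is (5/6, 1/6), so the expected energy use per period is a weighted average that lies much closer to the consumption in the good state.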

5.3 Optimal Policy for Rising Energy Prices

The  models  for  equipment  maintenance  discussed  in  this  report  up  to 
now  are  all  stationary  dynamic  programming  models.    The  stationarity 
assumption  in  particular  means  that  the  models  can  be  used  to  compare 
appropriate  policies  for  different  fuel  prices.    The  discussion,  however, 
does  not  immediately  apply  to  the  non-stationary  situation  in  which  fuel 
prices  are  expected  not  to  remain  constant,  but  to  be  increasing.  This 
section  shows  the  modifications  that  will  permit  the  previous  analysis 
to  find  a  policy  that  will  minimize  the  sum  of  future  expected  costs 
discounted  to  the  present  when  fuel  prices  are  increasing  at  a  constant  rate. 

Let $A_1$ denote the discount factor applicable to a dollar one period
in the future to obtain its present value. Suppose that fuel prices are
increasing at the rate r per period. Define $A_2 = A_1 \cdot (1+r)$. It is
assumed that $A_2 < 1$. Then the present value of a physical unit of fuel n
periods in the future is $(A_2)^n \cdot P$, where P is the current price of
fuel.

Suppose  for  the  moment  a  policy  is  specified,  i.e.,  for  each  possible 
state  of  the  system  an  action  has  been  identified  as  the  one  that  will 
be  taken  in  that  state.  Let  MC    be  the  vector  of  expected  maintenance 
costs  and  FC    the  vector  of  expected  energy  costs  at  current  prices, 
each  over  a  single  time  period.    Denote  the  transition  matrix  under  the 
given  policy  by  M. 


See  Denardo,  Eric  V.  "A  Markov  Decision  Problem"  in  Mathematical  Programming, 
edited  by  T.C.  Hu  (Academic  Press,  1973). 
If V is the present value of the future costs of the given policy,
it can be expressed as V = VM + VF, where VM is the vector of the expected
present value of maintenance costs and VF is the vector of the expected
present value of energy costs. In terms of the data specified above:

$$\mathrm{VM} = (I - A_1 M)^{-1}\, \mathrm{MC}, \quad \text{and}$$

$$\mathrm{VF} = (I - A_2 M)^{-1}\, \mathrm{FC}.$$

The  policy  improvement  routine  can  now  be  implemented  for  this  case 
of  increasing  energy  prices.    Recall  that,  for  each  state,  each  action 
available  for  that  state  is  examined  to  see  whether  it  would  yield  a 
smaller value of the objective function than the action specified in the
given policy or not. In testing action $a_k$ of state i the quantity that
should be examined is:

$$E_1 \equiv \mathrm{MC}_i(a_k) + \mathrm{FC}_i(a_k) + m_i(a_k)\,[A_1\, \mathrm{VM} + A_2\, \mathrm{VF}] - V(i),$$

where $m_i(a_k)$ is the row of transition probabilities out of state i
under action $a_k$. The results of the test are applied as described above
in section 4.2.

Note  that,  if  it  were  of  interest,  a  case  of  a  decreasing  factor 
price could be treated in an analogous fashion.


I  wish  to  thank  Stephen  R.  Petersen  for  having  suggested  that  the 
case  of  increasing  energy  prices  be  treated. 

6. EXAMPLE OF EQUIPMENT MAINTENANCE MODEL

by 

James  Kao 

This  chapter  applies  the  model  described  above  to  the  maintenance 
of  an  air  handling  unit.    The  equipment  is  first  described  in  engineering 
terms.    A  finite  state  description  is  next  given  and  the  possible  actions 
in  each  state  identified.    The  transition  from  state  to  state  under 
alternative  maintenance  actions  is  discussed.    Cost  per  period  as  a 
function  of  state-action  is  specified.    Finally,  an  optimal  policy  for 
each  of  several  energy  prices  is  identified.    In  the  spirit  of  presenting 
an  example  of  the  general  optimization  procedure,  some  remarks  on  the 
special  structure  of  the  example  that  would  permit  use  of  special  methods 
to  obtain  a  minimum  cost  policy  are  relegated  to  Appendix  II  at  the  end 
of  the  report. 

6.1    Filter  for  Air  Handling  Unit 

The  air  filters  for  building  air  filtration  are  well  suited  for 
illustrating  the  use  of  the  equipment  model  to  determine  a  cost  minimizing 
service  policy.    For  medium  and  high  efficiency  air  filters,  with  the 
exception  of  electrostatic  filters,  there  are  usually  air  filter  gauges 
installed  to  indicate  the  filter  air  resistance.    The  air  resistance  can 
be  conveniently  used  to  represent  the  "state"  of  the  air  filter.  The 
state  of  the  equipment  will  be  discussed  further  in  section  6.2. 

Although  the  energy  consumption  due  to  air  filter  friction  is  small 
compared  to  the  energy  required  for  heating  and  cooling  of  the  building, 
it  may  consume  as  much  as  one  quarter  of  the  entire  power  input  of  the 
fan motor for air distribution. The energy consumption of air filtration
is especially high for areas requiring a high air circulation rate, better
air quality, and long operating hours. Spaces which may have these
requirements are computer rooms, some areas in hospitals and laboratories,
and some industrial plants.

To illustrate the equipment model, a 22,000 CFM, constant fan
speed, air handling system is used. The air filters in the air handling
unit are 8-cell, 80-85% efficient (ASHRAE Atmospheric Dust Spot Efficiency
rating) bag type with an initial friction of .35" water gauge pressure
(WG). The filter bags are periodically replaced when they are loaded
and the general practice is to follow the filter manufacturer's recommended
replacement friction, which varies from approximately .8" to 1.0" WG for
this type of filter, depending on the manufacturer.

"Method for Testing Air Cleaning Devices Used in General Ventilation
for Particulate Matter (ASHRAE Standard 52-68)" (American Society of Heating,
Refrigerating and Air-Conditioning Engineers, Inc., 1968).

6.2 States of Systems

The state of the system may be any variable condition of the system
having a direct relationship with the energy consumption of the system.
In many cases, the maintenance personnel's judgment must be relied on in
determining the state of the system, although it is preferred that some
definitive indicators be used. For a steam-to-water heat exchanger, the
water pressure drop of the water tubes may be used as the state to
represent the cleanliness of the heat exchanger. In our example here,
the air flow rate of the air handling unit or the air velocity at a
certain point inside the unit may be used as the state, but the most
convenient state can be expressed by the friction of the filter as
displayed on the filter gauge. This pressure for our type of filters
can be anywhere in the interval .35 to .95 inches WG. This interval is
divided into 12 sub-intervals of .05" WG each and each of the
sub-intervals is consolidated into a single state for purposes of our
approximate analysis.

S1 corresponds to .35 ≤ WG < .40

S2 corresponds to .40 ≤ WG < .45

etc.,

S12 corresponds to .90 ≤ WG.
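
The assignment of a gauge reading to a state can be sketched in Python as below. The function name `filter_state` is hypothetical, and the treatment of readings below .35" WG (clamped to S1) is an assumption made here, since the text expects no such readings.

```python
def filter_state(gauge_inches_wg):
    """Map a filter gauge reading (inches WG) to a state number 1..12."""
    # The small offset guards against floating-point error at boundaries
    # such as .40 or .45.
    index = int((gauge_inches_wg - 0.35) / 0.05 + 1e-9)
    # Readings below .35 are not expected and are clamped to S1 here;
    # everything at or above .90 falls in S12.
    return max(0, min(11, index)) + 1
```

For instance, a reading of .62" WG falls in the sub-interval from .60 to .65 and is therefore assigned to S6.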

6.3  Possible  Actions  in  Each  State 

For  the  type  of  air  filters  in  the  example,  the  possible  actions  in 
each  state  are  the  same,  namely,  replacing  the  filters  or  doing  nothing. 
For more complicated mechanical equipment, the possible actions may
be many and may differ from state to state. For instance, the possible
actions for a refrigeration machine may include the following: doing
nothing, cleaning condenser tubes, cleaning evaporator tubes, overhauling
the compressor, or any combination of these actions.

6.4  Transitions  from  State  to  State 

It  is  not  the  intention  of  this  example  to  discuss  in  detail  the 
probability  distribution  of  the  states  at  the  end  of  the  time  intervals. 
For most pieces of equipment used in buildings there are probably
not enough operating records on their deterioration rate to warrant a
thorough probability distribution study. Sometimes judgment must be
relied  upon  to  determine  the  state  transition.    For  air  filters,  the 
dirt  loading  is  very  much  dependent  on  the  outside  air  inlet  location, 
the  air  quality  around  the  building  and  the  recirculated  air  quality 
which  varies  with  the  building  occupants'  activities.    A  filter  life 
chart  should  be  constructed  by  observing  the  filter  gauge  at  periodic 
time  intervals.    The  filter  life  chart  can  be  used  to  help  calculate  the 
transition  probabilities.    In  this  illustrative  example,  we  shall  use  a 
filter life curve of expected values such as the one shown in Figure 2.
It  should  be  noted,  due  to  the  reasons  given  above,  that  the  life  curve  of 
an  actual  air  filter  in  a  certain  building  may  not  duplicate  this  curve 
exactly.    The  time  interval  constituting  one  period  is  taken  to  be 
a  month  in  this  example. 

Judgment, based on the operating person's experience, must be used
in determining the possible state transition at the end of the time
interval. For example, at the end of one month after a new set of
filters is installed, we know that the resistance will not be below
.35" WG, but will the resistance go beyond .45" WG? In other words,
should we distribute the transition into two states (.35" WG and .40"
WG) or three states (.35" WG, .40" WG and .45" WG)? If the judgment of
the operating personnel is that the chance of being above .45" WG at
the next inspection is nil, then a two point distribution is selected.
The transition probabilities can be computed by:

$$y_1 P_1 + y_2 P_2 = E$$

$$P_1 + P_2 = 1$$

$$P_1 \ge 0, \quad P_2 \ge 0$$

where

y = air friction at certain states

P = transition probability to these certain states

E = the expected air friction at the end of inspection period obtained
from the life curve of Figure 2.


Base curve from "Air Filtration: Resistance, Energy and Service Life," by
Robert Avery, Heating/Piping/Air Conditioning, December 1973. Curve extended
from .8" WG to 1.1" WG resistance by fitting a polynomial.

In our example, at the end of one month after the filters are replaced:

$$.35\, P_1 + .40\, P_2 = .375$$

$$P_1 + P_2 = 1$$

$$P_1 \ge 0, \quad P_2 \ge 0$$

Therefore,

$$P_1 = .5, \qquad P_2 = .5$$

If it is decided that a three point distribution of the transition
probability should be made, such as the transition from initial state 2,
then we have

$$y_1 P_1 + y_2 P_2 + y_3 P_3 = E$$

$$P_1 + P_2 + P_3 = 1$$

$$P_1 \ge 0, \quad P_2 \ge 0, \quad P_3 \ge 0$$

or

$$.40\, P_1 + .45\, P_2 + .50\, P_3 = .46$$

$$P_1 + P_2 + P_3 = 1$$

$$P_1 \ge 0, \quad P_2 \ge 0, \quad P_3 \ge 0$$

The solution set of this system of inequalities is a line segment,
and the midpoint of the segment is:

$$P_1 = .2, \qquad P_2 = .4, \qquad P_3 = .4$$

In the absence of better information on the exact probability
distribution leading to the expected value recorded in the chart we have
taken this as an estimate.
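
Both constructions can be sketched in Python. The function names `two_point` and `three_point_midpoint` are hypothetical, and the parametrization by t = P1 is one convenient way, chosen here, of describing the line segment of solutions.

```python
def two_point(y1, y2, e):
    """Probabilities on frictions y1 < y2 whose mean is e."""
    p1 = (y2 - e) / (y2 - y1)
    if not 0.0 <= p1 <= 1.0:
        raise ValueError("expected friction lies outside [y1, y2]")
    return p1, 1.0 - p1


def three_point_midpoint(y1, y2, y3, e):
    """Midpoint of the segment of distributions on y1 < y2 < y3 with mean e."""
    # With P1 = t, the two equations give
    #   P3 = ((e - y2) + t * (y2 - y1)) / (y3 - y2),  P2 = 1 - t - P3.
    # P1 >= 0 and P3 >= 0 bound t from below; P2 >= 0 bounds it from above.
    low = max(0.0, (y2 - e) / (y2 - y1))
    high = (y3 - e) / (y3 - y1)
    if low > high:
        raise ValueError("expected friction lies outside [y1, y3]")
    t = 0.5 * (low + high)
    p3 = ((e - y2) + t * (y2 - y1)) / (y3 - y2)
    return t, 1.0 - t - p3, p3


new_filter = two_point(0.35, 0.40, 0.375)
from_state_2 = three_point_midpoint(0.40, 0.45, 0.50, 0.46)
```

With the figures used in the text, the end points of the segment are (0, .8, .2) and (.4, 0, .6), and the routine returns the point halfway between them.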

6.5   Cost  Per  Period  as  Function  of  State-Action 

The  cost  per  period  mainly  includes  the  cost  of  energy  to  operate 
the  equipment  and  the  cost  of  servicing  the  equipment.    Ordinarily,  the 
servicing  cost  consists  of  the  labor  cost  and  the  material  cost  of 
replacing  parts. 

In  the  air  filter  example,  the  energy  consumption  of  the  air  filters 
is approximated by taking the average of the fan power at the beginning
and  the  end  of  the  time  interval  multiplied  by  the  number  of  hours  of 
operation  time  during  the  inspection  period  of  one  month.    The  air  flow 
rate  used  in  the  fan  power  computation  is  adjusted  to  reflect  the  changing 
system  and  fan  characteristics  due  to  filter  dust  loading.    For  each 
state,  an  air  handling  system  characteristic  curve  representing  that  for 
the  average  air  filter  resistance  in  the  time  interval  can  be  constructed 
to  intersect  with  the  fan  characteristic  curve  to  obtain  the  air  flow 
rate  of  that  state.    The  filter  energy  consumption  is  then  computed  by 
using  the  equation: 

$$E = \frac{Q \times R \times T}{8510 \times e_m \times e_f}$$

where

E = energy consumption for the time interval, KWH

R = average air filter resistance in the interval, in. WG

Q = air flow rate, ft³/min

T = air handling unit operating hours in the time interval, hr

e_m = motor efficiency

e_f = fan static efficiency

Table 6 shows the filter frictions, air flow rates and the filter
energy consumption. The motor efficiency was assumed to be .85 and the
fan efficiency was assumed to be .75.
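
The energy equation can be evaluated with a few lines of Python. The function name `filter_energy_kwh` is hypothetical, and the 720 operating hours used in the example is an assumed figure for one month of continuous operation, not a value taken from Table 6.

```python
def filter_energy_kwh(flow_cfm, resistance_in_wg, hours,
                      motor_eff=0.85, fan_eff=0.75):
    """E = Q * R * T / (8510 * e_m * e_f), in KWH."""
    return flow_cfm * resistance_in_wg * hours / (8510.0 * motor_eff * fan_eff)


# 22,000 CFM at the initial .35" WG resistance; 720 hours is hypothetical.
energy = filter_energy_kwh(22000.0, 0.35, 720.0)
monthly_cost = 0.05 * energy     # energy cost at $.05/KWH
```

In the model itself the flow rate Q differs from state to state, because the fan and system curves intersect at a different point as the filter loads, so the function would be called with the flow rate read off for each state.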

The cost of replacing the air filters includes $320 for filters and $15
labor cost for one replacement.

The  possible  actions,  transitions  from  state  to  state  and  the  costs  per 
period  are  listed  in  Table  7. 
6.6    Cost  Minimizing  Policies 

Cost  minimizing  policies  obtained  by  the  optimization  procedure  are 
shown  in  the  computer  output  of  Table  8.    A  discount  factor  of  .99  and 
four  prices  for  energy  were  used. 

The printout takes the form V(i, j) = C, where i takes on the
integer values from 1 thru 12, indexing the possible states of the
system. If the j corresponding to a given i is 2, this indicates that
the filter is to be replaced when the system is in that state. If the
j corresponding to a given i is 1, this indicates that the filter is not
replaced in that state. C at V(i, j) is the expected value of discounted
future cost when the system starts in state i and the policy (k, j(k)),
k = 1, ..., 12 is used.

At the energy costs of $.025/KWH, $.05/KWH, $.075/KWH, and
$.10/KWH, the best policies are to replace the filters at S12, S12, S8,
and S6 respectively. These results illustrate that at higher energy
cost, it pays to replace the filters more frequently.

7. Summary and Suggestions for Further Research

This  paper  explains  a  method  for  modeling  the  performance  of  building 
service  equipment  as  a  function  of  maintenance  policies.    It  shows  how 
to  use  the  model  to  optimize,  by  a  correct  choice  of  maintenance  policies, 
the  equipment  performance  with  special  attention  to  energy  costs. 

Technological  and  economic  assumptions  are  explained.    Chapter  3 
discusses  the  class  of  policies  that  will  be  investigated  and  references 
literature  showing  that  this  class  contains  an  optimal  policy.    Then  an 
optimization  procedure  is  explained.    The  comparative  statics  of  maintenance 
policies  for  different  fuel  prices  is  examined;  it  is  also  shown  how  to 
apply  the  policy  evaluation  routine  and  optimization  method  to  the  case 
of  rising  energy  prices.    Chapter  6  is  an  application  of  the  model  of 
the  previous  sections  to  the  maintenance  problem  for  an  air  handling 
unit. 

The  paper  formulates  equipment  maintenance  for  energy  conservation 
as  an  economic  problem  and,  subject  to  the  assumptions  explained  in  the 
report,  solves    the  problem  of  optimization.    But  this  is  not  an  exhaustive 
study  of  the  problem.    The  following  tasks  appear  timely  and  significant: 

(1)  The  analysis  of  this  report  is  based  on  predetermined  periodic 
examination  of  the  equipment  to  ascertain  its  state.    The  appropriate 
examination  procedure  and  its  frequency  should  be  the  subject  of  investigation. 
Tradeoffs  between  better  focused  maintenance  policies  and  less  intensive 
and/or  frequent  equipment  inspection  will  often  be  available. 

(2)  The  data  required  (fully  described  in  the  report)  to  implement 
the  model  are  not  immediately  available  for  many  units  of  building 
service  equipment.    Therefore,  a  program  for  systematic  collection  and 
analysis  of  data  concerning  performance  of  equipment  and  the  effects  of 
maintenance  actions  on  equipment  performance  should  be  undertaken. 

(3) Appropriate statistical procedures for estimating parameters
characterizing equipment performance should be investigated.

(4) Methods should be formulated for decision making with respect
to maintenance policies on the basis of accumulating data with respect
to structure and parameters of the model.

Appendix I. Listing for Programs

In  this  appendix  we  give  a  listing  of  each  of  the  programs  described 
in  the  text.    In  some  cases  we  duplicate  a  listing  already  given  in  the 
main  body  of  the  report  for  the  completeness  of  the  appendix. 

POELFR 

10   A1= .99

30   DIM F(25,28),M(25,25),S(25,25),T(25,25),U(25,25)

40   DIM R(25,1),V(25,1),X(25,1)
50   FILES MAC9A
60   READ #1,D,L
65   MAT F=ZER(L,D+3)
70   MAT M=ZER(D,D)
80   MAT S=ZER(D,D)
90   MAT T=ZER(D,D)
100  MAT U=ZER(D,D)
110  MAT R=ZER(D,1)
120  MAT V=ZER(D,1)
130  MAT X=ZER(D,1)
140  MAT I=IDN(D,D)
150  MAT READ #1,F
160  CALL BLMM
170  CALL PEVAL1
180  FOR I=1 TO D
190  PRINT "V("I","X(I,1)")="V(I,1)
200  NEXT I
210  PRINT
220  CALL RTEST
230  FOR S=1 TO D
240  PRINT "V("S","X(S,1)")="V(S,1)
250  NEXT S
260  END

Table A1. Listing for Calling Program

Table A1 is the listing for POELFR, which is a program to identify
the discount factor being used [ln 10], to call up the analyzing
subroutines, and to print the results of the analysis.

Table A2 is the computer listing of the data for a hypothetical
piece of equipment. The interpretation of the listing in terms of the
physical and economic characteristics of the equipment is given in
subsection 2.2.1 of the report.

The subroutine BLMM is called in the program POELFR [ln 160]. A
listing for this subroutine is given in Table A3. The purpose of this
subroutine is to specify a starting policy preliminary to the policy
improvement routine.

Table A2. Listing for MAC9A.


BLMM 

30   FOR I=1 TO D STEP 1
40   FOR K=1 TO L STEP 1
50   IF F(K,D+2)<>I THEN 120
60   FOR J=1 TO D STEP 1
70   M(I,J)=F(K,J)
80   NEXT J
90   X(I,1)=F(K,D+3)
100  R(I,1)=F(K,D+1)
110  GO TO 130
120  NEXT K
130  NEXT I
150  RETURN
200  END


Table  A3.    Listing  for  BLMM 


Once an initial policy has been specified, the program calls
subroutine PEVAL1 [ln 170] to evaluate the policy. The listing for
PEVAL1 is given in Table A4. The operation of this subroutine is discussed
in subsection 3.2.3 of the report.

The program POELFR then calls subroutine RTEST [ln 220], the policy
improvement subroutine. The operation of this subroutine is explained
in subsection 4.2. The listing for RTEST is given in Table A5.

In conclusion, Table A6 gives the listing for a subroutine, STATV,
to calculate the fixed probability vector associated with a given policy.
Justification of the algorithm used in STATV is given in subsection 5.2.

PEVALI

50   MAT T=(A1)*M
60   MAT U=I-T
70   MAT S=INV(U)
80   MAT V=S*R
99   RETURN
100  END

Table  A4.    Listing  for  PEVALI 
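
As a cross-check on the matrix statements in PEVALI, the same calculation can be written as a linear solve: the value vector V of a fixed policy satisfies V = R + A1·M·V, that is, (I - A1·M)V = R. The following Python sketch is an illustration under that reading of the listing. The names `solve` and `evaluate_policy` and the two-state data are hypothetical, and Gaussian elimination stands in for the BASIC `INV` function.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):          # back substitution
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def evaluate_policy(M, R, a1):
    """Return V satisfying (I - a1*M) V = R for discount factor a1."""
    n = len(R)
    U = [[(1.0 if i == j else 0.0) - a1 * M[i][j] for j in range(n)]
         for i in range(n)]
    return solve(U, R)


# Hypothetical policy: state 2 is absorbing with cost 3 per period.
M = [[0.5, 0.5], [0.0, 1.0]]
R = [1.0, 3.0]
V = evaluate_policy(M, R, a1=0.9)
```

Solving the system directly avoids forming the inverse, which is a common numerical preference; the BASIC listing forms the inverse explicitly.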

RTEST

40   DIM Q(1,20),E(1,1)
50   C=0
60   FOR S1=1 TO D
110  MAT Q=ZER(1,D)
120  FOR K=1 TO L STEP 1
130  IF F(K,D+2)<>S1 THEN 270
140  FOR I=1 TO D STEP 1
150  Q(1,I)=F(K,I)
160  NEXT I
170  MAT E=Q*V
180  E1=F(K,D+1)+A1*E(1,1)-V(S1,1)
190  IF E1>-.00001 THEN 270
200  C=1
210  FOR I=1 TO D STEP 1
220  M(S1,I)=F(K,I)
230  NEXT I
240  R(S1,1)=F(K,D+1)
250  X(S1,1)=F(K,D+3)
260  CALL PEVALI
270  NEXT K
280  NEXT S1
290  IF C=1 THEN 50
400  RETURN
450  END

Table  A5.    Listing  for  RTEST 
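
The improvement pass in RTEST can be sketched in Python as follows. This is an illustration, not the report's program. It assumes the same row layout for F as in BLMM (D probabilities, cost, state number, action label), treats column D+1 as a cost to be minimized, and re-evaluates the policy immediately after each switch, as the `CALL PEVALI` statement suggests. The function names and the two-state data are hypothetical.

```python
def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting; returns x with A x = b."""
    n = len(b)
    aug = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


def evaluate(M, R, a1):
    """Value of the current policy: (I - a1*M) V = R."""
    n = len(R)
    U = [[(i == j) - a1 * M[i][j] for j in range(n)] for i in range(n)]
    return solve(U, R)


def improve(F, D, M, R, X, a1, tol=1e-5):
    """Repeat passes over all states until no action lowers any state's value."""
    V = evaluate(M, R, a1)
    changed = True
    while changed:
        changed = False
        for s in range(1, D + 1):
            for row in F:
                if row[D + 1] != s:
                    continue
                expected = sum(p * v for p, v in zip(row[:D], V))
                test = row[D] + a1 * expected - V[s - 1]
                if test > -tol:               # no worthwhile improvement
                    continue
                M[s - 1], R[s - 1], X[s - 1] = list(row[:D]), row[D], row[D + 2]
                V = evaluate(M, R, a1)        # re-evaluate at once
                changed = True
    return V


# Hypothetical data: in state 2, action 2 costs more now but returns to state 1.
F = [
    [0.5, 0.5, 1.0, 1, 1],
    [0.0, 1.0, 3.0, 2, 1],
    [1.0, 0.0, 4.0, 2, 2],
]
M, R, X = [[0.5, 0.5], [0.0, 1.0]], [1.0, 3.0], [1, 1]
V = improve(F, 2, M, R, X, a1=0.9)
```

The tolerance plays the role of the constant .00001 in line 190: it keeps rounding error from triggering an endless sequence of switches between equally good actions.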


STATV 


50   MAT  U=I-M 
60   FOR  K=1   TO  D 
70   U(K,1)=1
80   NEXT  K 
90   MAT  S=INV(U) 
200  RETURN 
250  END 


Table  A6.    Listing  for  STATV 
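
The construction in STATV can be checked on a small case. If π is the fixed probability vector of M, then π(I - M) = 0 and the entries of π sum to one. Replacing the first column of I - M by ones therefore gives a matrix U with πU = (1, 0, ..., 0), so π is the first row of the inverse of U. Reading the result from the first row is an inference from this algebra and is not stated in the listing. The Python sketch below uses hypothetical names and a hypothetical two-state chain.

```python
def invert(A):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    aug = [list(A[i]) + [1.0 if i == j else 0.0 for j in range(n)]
           for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def stationary_vector(M):
    """Fixed probability vector of M via the STATV construction."""
    n = len(M)
    U = [[(1.0 if i == j else 0.0) - M[i][j] for j in range(n)]
         for i in range(n)]
    for k in range(n):
        U[k][0] = 1.0              # replace the first column by ones
    S = invert(U)
    return S[0]                    # first row of the inverse


# Hypothetical two-state chain.
M = [[0.5, 0.5], [1.0, 0.0]]
pi = stationary_vector(M)
```

The inverse exists when the chain has a single recurrent class; otherwise the fixed vector is not unique and the routine above raises an error.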
Appendix II.    Control-Limit Maintenance Policies

The  example  presented  in  Chapter  6  of  this  report  was  discussed 
only  in  terms  of  the  general  Markov  decision  problem.    However,  the 
equipment  described  there  has  a  characteristic  structure  that  permits 
use  of  a  special  method  to  select  a  minimum  cost  policy  from  those 
called control-limit maintenance policies, a term explained below.
Discussion  of  this  special  structure  is  relegated  to  an  appendix  because 
in  general  good  maintenance  policies  will  not  be  of  the  control-limit 
form. 

Recall that the number of states in the model described in Chapter
6 is 12.  Only one action is available in state one and in state
twelve.  There are two actions available in each of the other ten
states.  Based on the formula of Section 2.2 there are $2^{10}$ stationary
policies possible.  However, the argument used by Derman¹ shows that the
set of control-limit policies contains a policy optimal over the set of
all policies.  In the case of the air handling unit described in Chapter
6 such policies are of the form:

     for friction < k" WG, do not change the filter

     for friction ≥ k" WG, change the filter.

The number of control-limit policies for the model above is 11 rather
than $2^{10}$.
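
The two counts can be confirmed by enumeration. The following Python sketch assumes that the twelve states are ordered by increasing friction, that action 1 means "do not change the filter" and action 2 means "change the filter", and that a control-limit policy is one whose action never drops back from 2 to 1 as the state index rises. The names used are hypothetical.

```python
from itertools import product

STATES = range(1, 13)


def allowed_actions(state):
    """State 1 never replaces, state 12 always replaces, others may do either."""
    if state == 1:
        return (1,)
    if state == 12:
        return (2,)
    return (1, 2)


# Every stationary policy: one allowed action per state.
all_policies = list(product(*(allowed_actions(s) for s in STATES)))


def is_control_limit(policy):
    """True when the actions are non-decreasing in the state index."""
    return all(a <= b for a, b in zip(policy, policy[1:]))


control_limit = [p for p in all_policies if is_control_limit(p)]
```

Each control-limit policy corresponds to one choice of the first state in which the filter is changed, and that state can be any of states 2 through 12.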

Because the cost function we use is slightly more general than that
used by Derman, we reproduce his argument with the necessary adjustment.
Let $M_i(k)$ denote the maintenance cost in state i when action k is used
and $P \cdot Q_i(k)$ the value of energy used in state i when action k is
applied.  (This is the notation of Section 5.1.)  For the air handling
unit of Chapter 6:

     $M_i(k)$ does not depend on i

     $P \cdot Q_i(1)$ is non-decreasing in i

     $P \cdot Q_i(2)$ does not depend on i

as can be verified from the table on p. 33.

¹ Cyrus Derman, "On Optimal Replacement Rules When Changes of State are Markovian,"
pp. 201-210, in Mathematical Optimization Techniques, edited by Richard Bellman.

Under action 1, not replacing the filter, for each m the function of i
defined by

$$\sum_{j=m}^{12} P_{ij}$$

is non-decreasing.


Set

$$\phi(i,0) = 0 \quad \text{for } i = 1, \ldots, 11, \qquad \phi(12,0) = M_{12}(2).$$

After $\phi(i,N-1)$ has been defined, proceed by induction to define

$$\phi(1,N) = P \cdot Q_1(1) + (A1) \cdot \sum_{j=1}^{12} P_{1j}\, \phi(j,N-1),$$

$$\phi(i,N) = \min\Big\{ P \cdot Q_i(1) + (A1) \cdot \sum_{j=1}^{12} P_{ij}\, \phi(j,N-1),\;\; M_i(2) + P \cdot Q_i(2) + (A1) \cdot \sum_{j=1}^{12} P_{1j}\, \phi(j,N-1) \Big\}$$

for i = 2, ..., 11, and set

$$\phi(12,N) = M_{12}(2) + P \cdot Q_{12}(2) + (A1) \cdot \sum_{j=1}^{12} P_{1j}\, \phi(j,N-1).$$

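Since $\phi(i,0)$ is non-decreasing as a function of i, the same is true
of $\phi(i,N)$ for every N.²  Hence $\phi(i) = \lim_{N \to \infty} \phi(i,N)$ is also
non-decreasing, and $\phi(i)$ satisfies the functional equations

$$\phi(1) = P \cdot Q_1(1) + (A1) \cdot \sum_{j=1}^{12} P_{1j}\, \phi(j),$$

$$\phi(12) = M_{12}(2) + P \cdot Q_{12}(2) + (A1) \cdot \sum_{j=1}^{12} P_{1j}\, \phi(j),$$

$$\phi(i) = \min\Big\{ P \cdot Q_i(1) + (A1) \cdot \sum_{j=1}^{12} P_{ij}\, \phi(j),\;\; M_i(2) + P \cdot Q_i(2) + (A1) \cdot \sum_{j=1}^{12} P_{1j}\, \phi(j) \Big\}$$

for i = 2, ..., 11.

² See Derman, op. cit., Lemma on p. 207.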

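Because $\phi$ is non-decreasing and the conditions above hold, the first
term in the minimum is non-decreasing in i, while the second term does
not depend on i.  Once replacement is optimal in some state it is
therefore optimal in every higher state.  This establishes that some
optimal policy has the control-limit form.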
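
The induction can be illustrated numerically. The Python sketch below applies the recursion for φ(i,N) to a hypothetical four-state version of the problem rather than to the twelve-state air handling unit, whose data are not reproduced here. The transition matrix, the energy costs, the maintenance cost of 3, and the discount factor of 0.9 are all assumptions chosen to satisfy the conditions stated above; the function name `backup` is also hypothetical.

```python
def backup(P, phi, energy_keep, energy_replace, maint_replace, a1):
    """One step of the recursion: returns (new phi, action chosen per state)."""
    n = len(P)
    future = [a1 * sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    keep = [energy_keep[i] + future[i] for i in range(n)]
    # Replacing the filter restarts the process from state 1 (row 0 here).
    replace = maint_replace + energy_replace + future[0]
    new_phi, actions = [keep[0]], [1]          # state 1: only action 1
    for i in range(1, n - 1):
        if replace < keep[i]:
            new_phi.append(replace)
            actions.append(2)
        else:
            new_phi.append(keep[i])
            actions.append(1)
    new_phi.append(replace)                    # last state: only action 2
    actions.append(2)
    return new_phi, actions


# Hypothetical data: tail sums of each row are non-decreasing down the matrix,
# and the energy cost under action 1 rises with the state.
P = [
    [0.6, 0.4, 0.0, 0.0],
    [0.0, 0.6, 0.4, 0.0],
    [0.0, 0.0, 0.6, 0.4],
    [0.0, 0.0, 0.0, 1.0],
]
energy_keep = [1.0, 2.0, 4.0, 8.0]
energy_replace, maint_replace, a1 = 1.0, 3.0, 0.9

phi = [0.0, 0.0, 0.0, maint_replace]           # phi(i, 0)
for _ in range(200):
    phi, actions = backup(P, phi, energy_keep, energy_replace, maint_replace, a1)

limit = actions.index(2) + 1                   # first state in which the filter is changed
```

For data meeting the stated conditions, the computed values are non-decreasing in the state and the chosen actions switch from 1 to 2 exactly once, which is the control-limit form.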